Using a Probabilistic Model of Context to Detect Word Obfuscation
نویسندگان
چکیده
This paper proposes a distributional model of word use and word meaning which is derived purely from a body of text, and then applies this model to determine whether certain words are used in or out of context. We suggest that we can view the contexts of words as multinomially distributed random variables. We illustrate how using this basic idea, we can formulate the problem of detecting whether or not a word is used in context as a likelihood ratio test. We also define a measure of semantic relatedness between a word and its context using the same model. We assume that words that typically appear together are related, and thus have similar probability distributions and that words used in an unusual way will have probability distributions which are dissimilar from those of their surrounding context. The relatedness of a word to its context is based on Kullback-Leibler divergence between probability distributions assigned to the constituent words in the given sentence. We employed our methods on a defense-oriented application where certain words are substituted with other words in an intercepted communication.
منابع مشابه
Studying impressive parameters on the performance of Persian probabilistic context free grammar parser
In linguistics, a tree bank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of tree bank data has been important ever since the first large-scale tree bank, The Penn Treebank, was published. However, although originating in computational linguistics, the value of tree bank is becoming more widely appreciated in linguistics research as a whole. F...
متن کاملیک مدل موضوعی احتمالاتی مبتنی بر روابط محلّی واژگان در پنجرههای همپوشان
A probabilistic topic model assumes that documents are generated through a process involving topics and then tries to reverse this process, given the documents and extract topics. A topic is usually assumed to be a distribution over words. LDA is one of the first and most popular topic models introduced so far. In the document generation process assumed by LDA, each document is a distribution o...
متن کاملASIC design protection against reverse engineering during the fabrication process using automatic netlist obfuscation design flow
Fab-less business model in semiconductor industry has led to serious concerns about trustworthy hardware. In untrusted foundries and manufacturing companies, submitted layout may be analyzed and reverse engineered to steal the information of a design or insert malicious Trojans. Understanding the netlist topology is the ultimate goal of the reverse engineering process. In this paper, we propose...
متن کاملA Model of E-Loyalty and Word-Of-Mouth based on e-trust in E-banking services (Case Study: Mellat Bank)
Customers extend robust trust to a business when they believe the business puts their interests first. Good experience of banking services and recommendations of other customers can increase trust. Loyalty and Word of mouth (WOM) is accepted as key factors successes of marketing. This paper seeks to discover the affecting factors on positive word of mouth and loyalty based on trust enhancement ...
متن کاملFirst Language Activation during Second Language Lexical Processing in a Sentential Context
Lexicalization-patterns, the way words are mapped onto concepts, differ from one language to another. This study investigated the influence of first language (L1) lexicalization patterns on the processing of second language (L2) words in sentential contexts by both less proficient and more proficient Persian learners of English. The focus was on cases where two different senses of a polys...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2008